22 research outputs found

    The shaky foundations of simulating single-cell RNA sequencing data

    BACKGROUND: With the emergence of hundreds of single-cell RNA-sequencing (scRNA-seq) datasets, the number of computational tools to analyze aspects of the generated data has grown rapidly. As a result, there is a recurring need to demonstrate whether newly developed methods are truly performant, both on their own and in comparison to existing tools. Benchmark studies aim to consolidate the space of available methods for a given task and often use simulated data that provide a ground truth for evaluations, thus demanding a high quality standard for results to be credible and transferable to real data. RESULTS: Here, we evaluated methods for synthetic scRNA-seq data generation in their ability to mimic experimental data. Besides comparing gene- and cell-level quality control summaries in both one- and two-dimensional settings, we further quantified these at the batch- and cluster-level. Secondly, we investigated the effect of simulators on clustering and batch correction method comparisons, and, thirdly, which quality control summaries can capture reference-simulation similarity, and to what extent. CONCLUSIONS: Our results suggest that most simulators are unable to accommodate complex designs without introducing artificial effects, that they yield over-optimistic performance of integration and potentially unreliable ranking of clustering methods, and that it is generally unknown which summaries are important to ensure effective simulation-based method comparisons.

    distinct: a novel approach to differential distribution analyses

    We present distinct, a general method for differential analysis of full distributions that is well suited to applications on single-cell data, such as single-cell RNA sequencing and high-dimensional flow or mass cytometry data. High-throughput single-cell data reveal an unprecedented view of cell identity and allow complex variations between conditions to be discovered; nonetheless, most methods for differential expression target differences in the mean and struggle to identify changes where the mean is only marginally affected. distinct is based on a hierarchical non-parametric permutation approach and, by comparing empirical cumulative distribution functions, identifies both differential patterns involving changes in the mean, as well as more subtle variations that do not involve the mean. We performed extensive benchmarks across both simulated and experimental datasets from single-cell RNA sequencing and mass cytometry data, where distinct shows favourable performance, identifies more differential patterns than competitors, and displays good control of false positive and false discovery rates. distinct is available as a Bioconductor R package.

    An R-based reproducible and user-friendly preprocessing pipeline for CyTOF data

    Mass cytometry (CyTOF) has become a method of choice for in-depth characterization of tissue heterogeneity in health and disease, and is currently implemented in multiple clinical trials, where higher quality standards must be met. Currently, preprocessing of raw files is commonly performed in independent standalone tools, which makes it difficult to reproduce. Here, we present an R pipeline based on an updated version of CATALYST that covers all preprocessing steps required for downstream mass cytometry analysis in a fully reproducible way. This new version of CATALYST is based on Bioconductor’s SingleCellExperiment class and is fully unit tested. The R-based pipeline includes file concatenation, bead-based normalization, single-cell deconvolution, spillover compensation and live cell gating after debris and doublet removal. Importantly, this pipeline also includes different quality checks to assess machine sensitivity and staining performance, while also allowing for batch correction. This pipeline is based on open source R packages and can easily be adapted to different study designs. It therefore has the potential to significantly facilitate the work of CyTOF users while increasing the quality and reproducibility of data generated with this technology.

    CyTOF workflow: differential discovery in high-throughput high-dimensional cytometry datasets [version 3; peer review: 2 approved]

    High-dimensional mass and flow cytometry (HDCyto) experiments have become a method of choice for high-throughput interrogation and characterization of cell populations. Here, we present an updated R-based pipeline for differential analyses of HDCyto data, largely based on Bioconductor packages. We computationally define cell populations using FlowSOM clustering, and facilitate an optional but reproducible strategy for manual merging of algorithm-generated clusters. Our workflow offers different analysis paths, including association of cell type abundance with a phenotype or changes in signalling markers within specific subpopulations, or differential analyses of aggregated signals. Importantly, the differential analyses we show are based on regression frameworks where the HDCyto data is the response; thus, we are able to model arbitrary experimental designs, such as those with batch effects, paired designs and so on. In particular, we apply generalized linear mixed models or linear mixed models to analyses of cell population abundance or cell-population-specific analyses of signaling markers, allowing overdispersion in cell count or aggregated signals across samples to be appropriately modeled. To support the formal statistical analyses, we encourage exploratory data analysis at every step, including quality control (e.g., multi-dimensional scaling plots), reporting of clustering results (dimensionality reduction, heatmaps with dendrograms) and differential analyses (e.g., plots of aggregated signals)

    Meta-analysis of (single-cell method) benchmarks reveals the need for extensibility and interoperability

    Computational methods represent the lifeblood of modern molecular biology. Benchmarking is important for all methods, but with a focus here on computational methods, benchmarking is critical to dissect important steps of analysis pipelines, formally assess performance across common situations as well as edge cases, and ultimately guide users on what tools to use. Benchmarking can also be important for community building and advancing methods in a principled way. We conducted a meta-analysis of recent single-cell benchmarks to summarize the scope, extensibility, and neutrality, as well as technical features and whether best practices in open data and reproducible research were followed. The results highlight that while benchmarks often make code available and are in principle reproducible, they remain difficult to extend, for example, as new methods and new ways to assess methods emerge. In addition, embracing containerization and workflow systems would enhance reusability of intermediate benchmarking results, thus also driving wider adoption

    Hand2 delineates mesothelium progenitors and is reactivated in mesothelioma.

    The mesothelium lines body cavities and surrounds internal organs, widely contributing to homeostasis and regeneration. Mesothelium disruptions cause visceral anomalies and mesothelioma tumors. Nonetheless, the embryonic emergence of mesothelia remains incompletely understood. Here, we track mesothelial origins in the lateral plate mesoderm (LPM) using zebrafish. Single-cell transcriptomics uncovers a post-gastrulation gene expression signature centered on hand2 in distinct LPM progenitor cells. We map mesothelial progenitors to lateral-most, hand2-expressing LPM and confirm conservation in mouse. Time-lapse imaging of zebrafish hand2 reporter embryos captures mesothelium formation including pericardium, visceral, and parietal peritoneum. We find primordial germ cells migrate with the forming mesothelium as ventral migration boundary. Functionally, hand2 loss disrupts mesothelium formation with reduced progenitor cells and perturbed migration. In mouse and human mesothelioma, we document expression of LPM-associated transcription factors including Hand2, suggesting re-initiation of a developmental program. Our data connects mesothelium development to Hand2, expanding our understanding of mesothelial pathologies

    muscat detects subpopulation-specific state transitions from multi-sample multi-condition single-cell transcriptomics data

    Single-cell RNA sequencing (scRNA-seq) has become an empowering technology to profile the transcriptomes of individual cells on a large scale. Early analyses of differential expression have aimed at identifying differences between subpopulations to identify subpopulation markers. More generally, such methods compare expression levels across sets of cells, thus leading to cross-condition analyses. Given the emergence of replicated multi-condition scRNA-seq datasets, an area of increasing focus is making sample-level inferences, termed here as differential state analysis; however, it is not clear which statistical framework best handles this situation. Here, we surveyed methods to perform cross-condition differential state analyses, including cell-level mixed models and methods based on aggregated pseudobulk data. To evaluate method performance, we developed a flexible simulation that mimics multi-sample scRNA-seq data. We analyzed scRNA-seq data from mouse cortex cells to uncover subpopulation-specific responses to lipopolysaccharide treatment, and provide robust tools for multi-condition analysis within the muscat R package

    SpatialExperiment: infrastructure for spatially resolved transcriptomics data in R using Bioconductor

    Summary: SpatialExperiment is a new data infrastructure for storing and accessing spatially resolved transcriptomics data, implemented within the R/Bioconductor framework, which provides advantages of modularity, interoperability, standardized operations, and comprehensive documentation. Here, we demonstrate the structure and user interface with examples from the 10x Genomics Visium and seqFISH platforms, and provide access to example datasets and visualization tools in the STexampleData, TENxVisiumData, and ggspavis packages. Availability and implementation: The SpatialExperiment, STexampleData, TENxVisiumData, and ggspavis packages are available from Bioconductor. The package versions described in this manuscript are available in Bioconductor version 3.15 onwards. Supplementary information: Supplementary tables and figures are available at Bioinformatics online